A Reactive Unobtrusive Prefetcher for Multicore and Manycore Architectures
Abstract
Processor performance continues to outpace memory performance by a large margin. The growing popularity of multicore and manycore architectures further exacerbates this problem, making it harder to keep the processors fed with data. One approach to mitigating this gap is software-based speculative prefetching. Software dynamic prefetchers can identify more complex patterns than hardware prefetchers while retaining the ability to respond to dynamic program behavior. However, modern techniques incur prohibitively high application overheads to detect and exploit these data access patterns, and they do little to accommodate multicore and manycore architectures. In this work, we present an unobtrusive software prefetcher that takes advantage of underutilized cores to improve the performance of neighboring cores. We leverage multicore and manycore design to decouple profiling, pattern detection, and prefetching from the application. Our approach takes advantage of cache coherence snooping mechanisms at the ISA level so that a neighboring processor core can observe cache miss patterns. With this capability, it is possible to create a reactive solution that complements a hardware prefetcher and performs pattern recognition and prefetching without altering the code or perturbing the performance of the running application. This allows the OS to deploy our prefetching engine seamlessly on any free core to assist neighboring cores, and to terminate it if those cores are needed. We call our approach unobtrusive reactive prefetching. In this paper, we outline our system, discuss our hardware extensions, and present our unobtrusive speculative hot stream extraction and prefetching algorithms for detecting and mitigating recurring cache miss patterns.
Measured against an aggressive hardware prefetcher baseline, our unobtrusive core-hopping prefetcher reduces the number of cache misses by an average of 26%, and in the best case it reduces the miss rate by 84%.
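The abstract does not spell out the hot stream extraction algorithm, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' method. It assumes a "hot stream" is a fixed-length sequence of miss addresses that recurs in an observed miss trace, and that seeing the first address of a stream triggers prefetches for the rest. The function names (`extract_hot_streams`, `build_prefetch_table`, `count_covered_misses`), the window length, and the recurrence threshold are all hypothetical.

```python
from collections import Counter


def extract_hot_streams(miss_trace, length=3, min_count=2):
    """Return every window of `length` miss addresses seen at least `min_count` times.

    Assumption: a fixed-length sliding window stands in for hot stream detection.
    """
    windows = (
        tuple(miss_trace[i:i + length])
        for i in range(len(miss_trace) - length + 1)
    )
    counts = Counter(windows)
    return {seq for seq, n in counts.items() if n >= min_count}


def build_prefetch_table(hot_streams):
    """Map the first address of each hot stream to the addresses that follow it."""
    table = {}
    for seq in hot_streams:
        table.setdefault(seq[0], set()).update(seq[1:])
    return table


def count_covered_misses(miss_trace, table):
    """Replay the trace and count misses that an earlier prefetch would have covered.

    Simplification: the table is built from and replayed on the same trace.
    """
    prefetched = set()
    covered = 0
    for addr in miss_trace:
        if addr in prefetched:
            covered += 1
            prefetched.discard(addr)
        # Reactively issue prefetches for the tail of any stream starting here.
        prefetched.update(table.get(addr, ()))
    return covered


# Hypothetical miss trace: the stream 0x10 -> 0x80 -> 0x40 recurs three times.
trace = [0x10, 0x80, 0x40, 0x99, 0x10, 0x80, 0x40, 0x77, 0x10, 0x80, 0x40]
streams = extract_hot_streams(trace)
table = build_prefetch_table(streams)
covered = count_covered_misses(trace, table)
```

In the system the abstract describes, this kind of detection would run on a neighboring free core that observes misses through the coherence mechanism, which is why the sketch only reads the trace and never touches the application itself.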
Similar resources
Unobtrusive Reactive Prefetching: A Multicore Approach for Exploiting Hot Streams in Cache Misses
Processor performance continues to outpace memory performance by a large margin. One approach for mitigating this gap is to employ software-based speculative prefetching. Software dynamic prefetchers are able to identify patterns more complex than those of hardware prefetchers while retaining the ability to respond to a program's dynamic behavior; however, modern techniques incur prohibitively hi...
Design of a novel congestion-aware communication mechanism for wireless NoC architecture in multicore systems
Hybrid Wireless Network-on-Chip (WNoC) architecture has emerged as a scalable communication structure to mitigate the deficits of traditional NoC architecture for future multicore systems. The hybrid WNoC architecture provides energy-efficient, high-data-rate, and flexible communications for NoC architectures. In these architectures, each wireless router is shared by a set of processing core...
Data Prefetching on a Manycore Architecture Case study: The XMT Platform
Multicore architectures are becoming ubiquitous in the microprocessor market today. All major vendors are pushing up the number of cores, with plans to roll out as many as 80 cores on a chip by the year 2010. The creation of manycore architectures – hundreds to thousands of cores per processor – is seen by many as a natural evolution of multicore. We present several compiler optimizations targe...
On the energy efficiency and performance of irregular application executions on multicore, NUMA and manycore platforms
Until the last decade, the performance of HPC architectures was quantified almost exclusively by their processing power. However, energy efficiency has recently come to be considered as important as raw performance and has become a critical aspect of the development of scalable systems. These strict energy constraints guided the development of a new class of so-called light-weight manycore processor...
Algorithm-level Feedback-controlled Adaptive data prefetcher: Accelerating data access for high-performance processors
The rapid advance of processor architectures, such as the emergence of multicore architectures, and the substantially increased on-chip computing capability have put more pressure than ever on sluggish memory systems. Meanwhile, many applications are becoming more and more data intensive. Data-access delay, not the processor speed, becomes the leading performance bottleneck of high-performance com...